Abstract

Recent years have seen an unprecedented rise in the role that technology plays in all aspects of human activity. Unavoidably, technology has heavily entered the capital markets trading space, to the extent that all major exchanges now trade exclusively on electronic platforms. The ultra-fast speed of information processing, order placement, and cancellation generates new dynamics that are still not completely understood. Analyzing a large dataset of stocks traded on the US markets, our study shows that since 2001 the level of synchronization of large price movements across assets has significantly increased. Even though the total number of over-threshold events has diminished in recent years, when an event occurs, the average number of assets swinging together has increased. Quite unexpectedly, only a minor fraction of these events (regularly less than 40% in every year) can be connected with the release of pre-announced macroeconomic news. We also document that the larger the level of systemicity of an event, the larger the probability, and the degree of systemicity, of a new event occurring in the near future. This opens the way to the intriguing idea that systemic events emerge as an effect of a purely endogenous mechanism. Consistently, we present a high-dimensional, yet parsimonious, model based on a class of self- and cross-exciting processes, termed Hawkes processes, which reconciles the modeling effort with the empirical evidence.


1 Introduction 

Quoting from Michael Lewis’ Flash Boys: “The world clings to its old mental picture of the stock market because it’s comforting” [1]. But trading activity has profoundly changed, from the old phone conversation or click-and-trade on a screen to software programming. Market statistics confirm that automated algorithms carry out a significant fraction of the trading activity on US and European electronic exchanges [2, 3]. As algos feed on financial and news data, the speed of information processing has dramatically increased, potentially allowing large price movements to propagate very rapidly through different assets and exchanges [4].

The synchronization effect had its most spectacular appearance during the Flash Crash of May 6th, 2010. The crash started from a rapid price decline in the E-Mini S&P 500 market; in a very short time the anomaly became systemic and the shock propagated to ETFs, stock indices and their components, and derivatives [5, 6]. The Dow Jones Industrial Average plunged by 9% in less than 5 minutes but recovered the pre-shock level in the next 15 minutes of trading. The SEC reported that the swing was sparked by an algorithm executing a sell order placed by a large mutual fund. High frequency traders, even though they did not ignite the event, then caused a “hot potato” effect that amplified the crash. In the aftermath of the crash, several studies have focused on events, evocatively named Mini Flash Crashes, consisting of large price movements of an asset in a very short time, attributing their origin to the interaction between several automatic algorithms [7] or to the unexpected product of regulation framework and market fragmentation [8].

The Flash Crash, however, has also dramatically shown how strongly interconnected different markets and asset classes can become, especially during extreme events. In this paper, taking a different yet complementary approach to the above literature, we investigate how the frequency of collective instabilities at high frequency has changed in recent years. Specifically, we identify one-minute extreme events as over-threshold movements. In this respect, our approach shares some similarities with previous works employing non-parametric tests to identify extreme movements, see [9, 10, 11, 12, 13]. We perform our analysis on a yearly basis from 2001 to 2013 on a data sample of highly liquid US equities and we identify extreme events affecting a sizable fraction of the investigated assets. Remarkably, very little research has been devoted to the investigation of this kind of systemic event. A few notable exceptions are [14], who aim at the identification of common large movements between the market portfolio and individual stocks, and [15], who investigate the tendency of large movements to arrive simultaneously. A very recent non-parametric test of the occurrence of simultaneous jumps across multiple assets is discussed in [16]. Our research provides empirical evidence that, while the total number of extreme movements has decreased over the years, the occurrence of systemic events has significantly increased.

To identify the possible causes of such events we compare their times of occurrence with a database of pre-scheduled macroeconomic announcements. Since macroeconomic news can be expected to have a market-level influence, it represents a natural candidate to explain market-wide events. For instance, the literature has recognized the peculiar role played by Federal Open Market Committee (FOMC) meetings deciding the interest rate level [17, 18]. However, unexpectedly, only a minor fraction (less than 40%) of events involving a large fraction of assets has been preceded by the release of a macroeconomic announcement. This evidence opens the route to the more intriguing hypothesis that genuinely endogenous dynamics are taking place. To the best of our knowledge, the association between extreme equity price movements and news arrival has been previously investigated in [11, 19], finding a positive association, but the results have been challenged in [20]. Table 11 in [15] suggests the existence of a particularly strong relationship between FOMC announcements and the arrival of a systemic event (defined as an event when the market index jumps). However, none of the previous works performs an analysis of the association between news and extreme movements conditional on the level of systemicity of the event.

Finally, we show that when an event affecting a significant fraction of assets occurs, the probability of a new extreme event in the subsequent minutes increases. More interestingly, there is clear evidence that the more systemic the conditioning event is, the larger the expected number of assets swinging synchronously in the immediate future will be. In order to reproduce this empirical evidence, we propose a model within the class of mutually exciting point processes, termed Hawkes processes [21], which in recent years have experienced increasing popularity in mathematical finance and econometrics [22, 23, 24, 25, 26, 27, 28, 29, 30]. We present a multidimensional, yet parsimonious, Hawkes process which captures with remarkable realism the cross-excitation affecting over-threshold events.

2 Data 

Financial data. We conduct our analysis on price time series of financial stocks belonging to the Russell 3000 Index, traded in the US equity markets (mostly NYSE and NASDAQ). We consider the thirteen years from 2001 to 2013 and for each year we select 140 highly liquid stocks. We use 1-minute closing price data during the regular US trading session, i.e. from 9:30 a.m. to 4:00 p.m., and, as explained in the Support Information, we remove the intraday pattern of volatility, which is a local measure of the diffusion rate of price.

News data. We use macroeconomic news data provided by Econoday, Inc., www.econoday.com. We consider the 42 most important news categories, which are classified into two large groups according to their capacity to influence the financial markets: the Market Moving Indicator group and the Merit Extra Attention group. Since we are concerned with matching news with extreme market events, we consider only the 27 categories whose announcement times occur during the trading session. The total number of news announcements per year ranges from around 150 in the first years to around 260 in the last years, for a total of 2,888 announcements. See the Support Information for more details.

3 Methods 

3.1 Identification of extreme events 

In order to detect extreme variations of the stock prices $P_t$, we compare price returns (defined as $r_t = \ln(P_t/P_{t-1})$) with an estimate of the historical spot volatility, which sets the scale of local price fluctuations. Specifically, we calculate a volatility time series $\sigma_t$ as an exponential-moving-average version of the bipower variation (see [9, 31, 32]) of the return time series, and we finally say that an extreme return occurs when

$$\frac{|r_t|}{\sigma_t} > \theta \qquad (1)$$

for a certain threshold $\theta$. In our main analyses we take $\theta = 4$, but we also investigate higher values of the threshold, namely $\theta = 6, 8, 10$, in some of our descriptive statistics.
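The detection rule of Eq. 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the smoothing weight `decay`, the use of volatility estimated from returns strictly before minute $t$, and the function name `detect_jumps` are all assumptions made for illustration; the exact estimator is described in [9].

```python
import numpy as np

def detect_jumps(prices, theta=4.0, decay=0.05):
    """Flag returns with |r_t| / sigma_t > theta (sketch of Eq. 1).

    prices : 1-D array of positive prices sampled every minute.
    theta  : threshold on the volatility-normalised return.
    decay  : EWMA weight (hypothetical value, not taken from the paper).
    """
    r = np.diff(np.log(prices))            # r_t = ln(P_t / P_{t-1})
    abs_r = np.abs(r)

    # Bipower-variation increments (pi/2) * |r_t| * |r_{t-1}|
    bp = np.empty_like(abs_r)
    bp[0] = np.nan
    bp[1:] = (np.pi / 2.0) * abs_r[1:] * abs_r[:-1]

    jumps = np.zeros(r.shape, dtype=bool)
    var = np.nan                           # running spot-variance estimate
    for t in range(len(r)):
        # Assumption: compare r_t with volatility built from earlier returns
        if np.isfinite(var) and var > 0:
            jumps[t] = abs_r[t] > theta * np.sqrt(var)
        # Exponential moving average of the bipower increments
        if np.isfinite(bp[t]):
            var = bp[t] if not np.isfinite(var) else (1 - decay) * var + decay * bp[t]
    return jumps
```

The returned array has one flag per return, so applying the function to each stock and stacking the results column-wise gives a minutes-by-stocks matrix of jump indicators.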

4 Results 

The main objective of this paper is the modeling of the dynamics of synchronous large price variations at high frequency. We say that a stock jumps in a given one-minute interval if the condition of Eq. 1 is observed for a given $\theta$. Here we are mostly interested in cojumps, i.e. the simultaneous (inside the minute) occurrence of jumps for a subset of $M$ stocks. The quantity $M$ is termed the multiplicity of the cojump, and it gives a measure of the systemic nature of the event. In the following we consider three questions: (i) how has the high frequency instability changed in the last fifteen years? (ii) what fraction of the systemic instabilities can be attributed to macroeconomic news? (iii) how can we model the short term dynamics of market instabilities?

Figure 1: Time series of the cojumps detected for the dataset of 140 selected highly liquid stocks of the Russell 3000 Index during year 2001 (left panel) and 2013 (right panel). The size of the circles increases with the multiplicity of the cojump event. Legend classes: $2 \le M \le 5$, $6 \le M \le 10$, $11 \le M \le 20$, $21 \le M \le 40$, $41 \le M \le 80$, $81 \le M \le 140$.
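As a small illustration of the definition of multiplicity, the sketch below counts how many stocks jump in each minute and selects the minutes that qualify as cojumps. The input layout (a boolean minutes-by-stocks matrix) and the function names are assumptions made for this example.

```python
import numpy as np

def cojump_multiplicity(jump_flags):
    """Return M for every minute: the number of stocks jumping in that minute.

    jump_flags : boolean array of shape (minutes, stocks).
    """
    return np.asarray(jump_flags, dtype=bool).sum(axis=1)

def cojump_minutes(jump_flags, min_multiplicity=2):
    """Return the indices of minutes with multiplicity >= min_multiplicity."""
    m = cojump_multiplicity(jump_flags)
    return np.flatnonzero(m >= min_multiplicity), m
```

With `min_multiplicity=2` the function returns all cojumps; larger values (for instance 30 or 60) select the more systemic events discussed in the text.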

4.1 Historical dynamics of jumps and cojumps 

A visual representation of how the instability of financial markets has changed in recent years is shown in Fig. 1, which compares the dynamics of $\theta = 4$ cojumps in 2001 (left panel) and 2013 (right panel). The horizontal axis represents the trading day and the vertical axis indicates the hour of the day. The presence of a circle indicates the occurrence of a cojump and the color codifies the number of stocks simultaneously cojumping (i.e. the multiplicity). In 2001 there were many cojumps with low multiplicity, and the high multiplicity cojumps were concentrated mostly at specific hours of the day (10 a.m. and 2:15 p.m.) corresponding to the release of important macro announcements, such as the FOMC announcements. On the contrary, in 2013 we observe fewer low multiplicity cojumps and many more high multiplicity cojumps, which are quite scattered during the day. This is an indication that modern financial markets have become more systemically unstable and that these instabilities are less related to macro news. In the following we show with more quantitative analyses that this is the case.

Figure 2: Top left panel: Semi-log plots of the total number of minutes where we detect at least one jump among the 140 selected assets of the Russell 3000 Index. Curves correspond to four different levels of the threshold parameter $\theta$. Top right panel: For $\theta = 4$, yearly time evolution of the fraction of minutes with at least one event of multiplicity larger than or equal to 2, 10, 30, 60. All values are normalised by the corresponding 2001 values. Bottom left panel: Yearly evolution of the percentage fraction of cojumps with multiplicity at least equal to 30 for four different values of $\theta$. Bottom right panel: Log-log plots of the complementary cumulative distribution function of the cojump multiplicity for seven different years. The panel reports the empirical evidence for a portfolio of 140 stocks, while the inset details results of the same analysis conducted with 700 liquid assets from the Russell 3000 during years 2011 and 2012.

First, in the top left panel of Fig. 2 we show the frequency of jumps per minute in each year, considering different values of $\theta$. We observe that for all values of $\theta$ the number of jumps has actually decreased over time. The different lines are roughly parallel to each other (especially for $\theta \ge 6$), indicating that the tails of the one-minute return distribution remained quite stable. A completely different pattern emerges when we consider the dynamics of cojumps. The top right panel of Fig. 2 shows the frequency of cojumps of different multiplicity (normalized to its value in 2001). While the frequency of cojumps with any multiplicity ($M \ge 2$) has slightly declined, high multiplicity cojumps have become in recent years up to 10 times more frequent than in 2001. The result is essentially unchanged when fixing the minimal multiplicity (e.g. $M \ge 30$) and computing the number of cojumps for different values of $\theta$ (bottom left panel of Fig. 2). Clearly larger fluctuations are observed for larger values of $M$. The increase in the frequency of high multiplicity events is not due to the fact that markets have become faster. In the Support Information we show the fraction of cojumps with $M \ge 30$ and $M \ge 60$ detected at time scales of 1, ..., 5 minutes. It is clear that the variability with the time window defining the event is much smaller than the secular variability of the events. In fact the fraction of 1-min cojumps with $M \ge 30$ in 2013 is significantly larger than the fraction of 5-min cojumps with $M \ge 30$ in 2001. The same is true for cojumps with $M \ge 60$. Therefore, the increase in synchronization is a genuine phenomenon, not explained by the increase in market speed.

Finally, the bottom right panel of Fig. 2 shows the distribution function of the cojump multiplicity for different years. Although some variation is observed across the years, a clear power law tail behavior is evident. This means that the probability of systemic cojumps is quite large. Consistently with the observations above, the tail is thicker in recent years (even if in 2013 we observe a slightly thinner tail). It is important to notice that the bending of the distributions for large multiplicity is very likely due to the finite support of the distribution. Clearly, for a set of $N$ stocks the multiplicity cannot be larger than $N$, thus the distribution function is zero at $M = N$. To show the role of the finite support, in the inset we show the multiplicity distribution function for a larger set of 700 highly liquid assets. In this case the power law region extends over a wider range and close to $M = 700$ we observe the expected bending of the function. The tail exponent of these distributions is close to 1.5 (similarly to what is observed in [33]).

Figure 3: Top panel: Fraction of cojumps in 2012 with multiplicity larger than or equal to the value reported on the x axis for which a news announcement occurred in the last 1, 5, 10, and 15 minutes. Bottom panel: Fraction of cojumps for different multiplicities $M$ for which we observe at least one news announcement in a time window of five minutes preceding the jump event.
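The multiplicity distribution discussed above can be inspected with a short sketch that computes the empirical complementary cumulative distribution function and a rough log-log slope. The least-squares fit is only a quick diagnostic chosen for this example, not the estimation procedure of the paper, and the fitting range `m_min`, `m_max` is a hypothetical user choice.

```python
import numpy as np

def empirical_ccdf(multiplicities):
    """Return the distinct multiplicities and P(M >= m) for each of them."""
    m = np.asarray(multiplicities)
    values = np.unique(m)
    ccdf = np.array([(m >= v).mean() for v in values])
    return values, ccdf

def loglog_slope(values, ccdf, m_min, m_max):
    """Least-squares slope of log CCDF versus log m on [m_min, m_max].

    Returns the slope with the sign flipped, so a power law
    CCDF ~ m^(-a) gives a value close to a.
    """
    values = np.asarray(values, dtype=float)
    ccdf = np.asarray(ccdf, dtype=float)
    mask = (values >= m_min) & (values <= m_max) & (ccdf > 0)
    slope, _ = np.polyfit(np.log(values[mask]), np.log(ccdf[mask]), 1)
    return -slope
```

Restricting the fit to a range well below the number of stocks avoids the bending induced by the finite support.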

In conclusion, at the beginning of the 2000s individual jumps were more frequent, while high frequency systemic instabilities, i.e. high multiplicity cojumps, were rare and mostly concentrated on macro-news announcements. In recent years, on the contrary, markets often display systemic cojumps, and these are scattered across the trading day.


4.2 Systemic cojumps and macroeconomic news 

The second question is what fraction of these systemic cojumps has an exogenous or an endogenous origin. To answer this question we study how frequently a systemic cojump is preceded by a scheduled macroeconomic announcement. It is in fact unlikely that stock-specific idiosyncratic news affects the whole market. We measure how frequently a systemic cojump with multiplicity larger than $M$ is preceded by a macroeconomic announcement in the last $\tau = 1, 5, 10, 15$ minutes. The top graph of Fig. 3 shows that only 40% of the high multiplicity cojumps are preceded by a macroeconomic announcement in the previous 15 minutes. Notice that the fractions of news-triggered systemic events in the 5, 10, and 15 minute time windows are very close to each other, indicating that if a macroeconomic announcement triggers a systemic cojump, this will typically happen within 5 minutes of the announcement.
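The matching between cojumps and announcements can be sketched as follows. The function assumes that both sets of timestamps are expressed in minutes on a common clock and that an announcement counts when it falls in the closed window $[t - \tau, t]$; both conventions, and the function name, are assumptions of this example rather than details given in the text.

```python
import numpy as np

def fraction_preceded_by_news(cojump_times, news_times, tau):
    """Fraction of cojumps with at least one announcement in [t - tau, t].

    cojump_times, news_times : timestamps in minutes on the same clock.
    tau                      : look-back window in minutes.
    """
    cojump_times = np.asarray(cojump_times, dtype=float)
    news = np.sort(np.asarray(news_times, dtype=float))
    if cojump_times.size == 0:
        return float("nan")
    # Count announcements up to t and strictly before t - tau;
    # a positive difference means at least one falls in the window.
    upper = np.searchsorted(news, cojump_times, side="right")
    lower = np.searchsorted(news, cojump_times - tau, side="left")
    return float(np.mean(upper > lower))
```

Evaluating the function for $\tau = 1, 5, 10, 15$ on the cojumps above a given multiplicity gives one point per curve of the kind of plot described for Fig. 3.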

For a historical perspective, the bottom graph of Fig. 3 shows that the fraction of systemic cojumps triggered by macroeconomic news is quite constant across the years and, even for large $M$, clearly below 50%. Thus our empirical analysis shows that a relevant portion of systemic cojumps is not associated with scheduled macroeconomic announcements. Idiosyncratic company-specific news may play a role, but plausibly only for those events which involve a very limited number of assets. For high multiplicity cojump events, endogenous mechanisms are likely to play a determinant role.

5 Model 

5.1 Hawkes process for multiplicity vector 

The empirical evidence of the previous section suggests that a large fraction of the dynamics of the systemic cojumps is unrelated to macro news and is likely endogenously generated. Moreover, as observed for example in the 2010 Flash Crash, market instabilities tend to propagate quickly to other assets, markets, or asset classes. Thus it is important to model the self- and cross-dependence of instabilities, considering both synchronous and lagged dependence, by studying whether and how systemic instabilities trigger other instabilities in the short run.

However, the estimation of the interaction among a set of 140 variables is extremely challenging and some sort of filtering is needed. A first step in this direction was taken in [9], where we modeled the multivariate point process describing the jumps with a Hawkes factor model. Each stock is represented by a point process, each count being a jump. The coupling between the stocks is given by a one-factor model structure, i.e. the intensity is the sum of the intensity of a factor and the intensity of an idiosyncratic term. Finally, in order to capture the temporal clustering of events, we assumed that both the factor and the idiosyncratic term follow a Hawkes process.

As shown in [9] this type of modeling is very effective (and parsimonious) in describing the pairwise properties of cojumps, i.e. the probability that two stocks jump in the same time interval. However, when considering cojumps of $M > 2$ stocks, the model shows its weakness. An important indication is given by the distribution of multiplicities. It is possible to show that in the large $N$ limit, the factor model of [9] predicts a multiplicity distribution with Gaussian tails, at odds with the power law behavior observed empirically in the bottom right panel of Fig. 2. Moreover, in that model the multiplicity of a systemic cojump is independent of the multiplicity of previous systemic cojumps, while the right panel of Fig. 1 shows clear temporal clusters of high multiplicity cojumps.

For these reasons, in this paper we propose a new modeling approach which preserves the 
parsimony and is able to overcome the problems of the model of [9]. The idea is to model 
directly the vector of multiplicities, losing information on the identity of the cojumping stocks. 

Specifically, we consider an $N$-dimensional point process characterized by the vector of intensities $\lambda_t$. An event in the $i$-th component at time $t$ means that at this time a systemic cojump of multiplicity $i$ has occurred. Under this modeling assumption we know the total number of assets which have jumped, but we can no longer identify which companies among the $N$ possible ones have moved. To model the self- and cross-excitation of cojumps we use an $N$-dimensional Hawkes process with exponential kernels (see the Support Information for the definition and the most relevant features). In general, the model depends parametrically on the baseline intensity vector $\mu$ and on the $N \times N$ matrices $\alpha_{ij}$ and $\beta_{ij}$ of parameters characterizing the kernels. In order to reduce the dimension of the estimation problem from $N + 2N^2$ to a more manageable number of unknowns, we proceed as follows. Since an important goal of our model is the ability to reproduce the empirical stationary distribution of the multiplicity vector, we assume $\mu = \eta\,E[\lambda_t]$, where $0 < \eta < 1$, and $E[\lambda_t]$ proportional to the observed multiplicity frequencies. Interestingly, it is possible to show that $1 - \eta$ is the spectral radius of the kernel matrix and therefore it measures the fraction of intensity explained by the self- and cross-excitation, while $\eta$ is the fraction explained by the baseline (exogenous) intensity. We assume that all the parameters $\beta_{ij}$, which characterize the decay time of the self- and cross-excitations, are equal to a constant value $\beta$. Finally, we hypothesize that, for fixed $i = 1, \ldots, N$, the largest intensity shock is ascribable to the self-exciting term $\alpha_{ii}$, while the cross-exciting effects decrease hyperbolically, with tail exponent $\gamma$, as a function of the distance $|i - j|$ between multiplicities. This means that cojumps of a given multiplicity excite with higher probability cojumps with similar multiplicity. To sum up, the model is completely specified in terms of three parameters, $\eta$, $\beta$, and $\gamma$, and the empirical expected number of events with fixed multiplicity.
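The statement about the spectral radius can be motivated by a short derivation. It is a sketch under stated assumptions: it uses the stationary-mean relation of the Support Information, $E[\lambda_t] = (I_N - \Gamma)^{-1}\mu$ with $\Gamma_{ij} = \alpha_{ij}/\beta_{ij}$, and it assumes that all entries of $\Gamma$ are non-negative and all components of $E[\lambda_t]$ are strictly positive.

```latex
\begin{align*}
(I_N - \Gamma)\,E[\lambda_t] &= \mu = \eta\,E[\lambda_t] \\
\Longrightarrow \quad \Gamma\,E[\lambda_t] &= (1 - \eta)\,E[\lambda_t] .
\end{align*}
```

Hence $E[\lambda_t]$ is a strictly positive eigenvector of the non-negative matrix $\Gamma$ with eigenvalue $1 - \eta$. By Perron-Frobenius theory, a non-negative matrix that admits a strictly positive eigenvector has the corresponding eigenvalue as its spectral radius, which gives the interpretation of $1 - \eta$ as the endogenous fraction of the intensity.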

5.2 Model results 

We apply the model to the dataset of 140 stocks in 2013. In order to calibrate and test the model we make use of two quantities defined in the Support Information. The first one, which depends on $M$ and $J$, is the probability, conditional on the realization at time $t$ of an event with multiplicity at least $M$, of a cojump with multiplicity at least $J$ in the interval $(t, t + \tau]$. It measures how frequently a systemic cojump triggers other systemic cojumps in the short run. The second quantity, which depends on $M$ only, is the average multiplicity of the cojumps inside a time interval of length $\tau$ after a cojump of multiplicity larger than or equal to $M$. It therefore measures the typical cojump multiplicity triggered by a cojump of multiplicity at least $M$. We consider here the case $\tau = 5$ minutes.
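A rough empirical estimate of the two conditional quantities can be sketched in a few lines. The precise definitions are given in the Support Information; the version below is an assumption-laden illustration: it works on a single series of per-minute multiplicities, ignores day boundaries, skips conditioning events whose window is truncated at the end of the sample, and averages only over minutes with at least two jumping stocks.

```python
import numpy as np

def conditional_stats(mult, M, J, tau=5):
    """Estimate the triggering probability and the triggered multiplicity.

    mult : integer array, number of jumping stocks in each minute.
    M    : minimum multiplicity of the conditioning event at time t.
    J    : minimum multiplicity of the triggered event.
    tau  : window length in minutes, covering t+1, ..., t+tau.
    Returns (prob, avg): the fraction of conditioning events followed by a
    cojump of multiplicity >= J, and the mean multiplicity of the cojumps
    (multiplicity >= 2) found in the windows.
    """
    mult = np.asarray(mult)
    hits, followers = [], []
    for t in np.flatnonzero(mult >= M):
        window = mult[t + 1 : t + 1 + tau]
        if window.size < tau:
            continue                      # truncated window at sample end
        hits.append(bool(np.any(window >= J)))
        followers.extend(window[window >= 2].tolist())
    prob = float(np.mean(hits)) if hits else float("nan")
    avg = float(np.mean(followers)) if followers else float("nan")
    return prob, avg
```

Applying the same function to a randomly permuted copy of `mult` gives a benchmark in which lagged dependence has been destroyed.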

We use the conditional probability with $J = 10$ and the conditional average multiplicity to calibrate the model (see the Support Information for details), and we test it on the conditional probability with $J = 30$ and $J = 60$. The estimated parameters are $\eta = 0.15$, $\beta = 0.6$, $\gamma = 2.65$. Thus 85% of the cojump activity is explained by the excitation mechanism and only 15% is exogenous. The typical timescale of the memory is $1/\beta \simeq 1.67$ minutes, and the relatively low value of $\gamma$ indicates a strong cross-excitation between different multiplicities. As expected, the model effectively reproduces the stationary distribution of the multiplicities observed in empirical data (see Fig. 3 of the Support Information). Fig. 4 reports the two conditional quantities in real and simulated data. The solid line corresponds to the empirical probabilities, the dotted line to the results from the Hawkes model, and as a benchmark case we also show the result of a shuffling experiment on the multiplicity time series (dashed line). It is evident that by dropping the lagged correlations we obtain an unrealistic description of the multiplicity process. The Hawkes model, on the contrary, fits the empirical data well and therefore adequately describes the cross-excitation mechanism between systemic cojumps. Some discrepancies are observable for $J = 60$, but the general shape of the curve and its level are well reproduced, and the Hawkes model is a huge improvement with respect to the benchmark case. This evidence confirms that the larger the value of the conditioning multiplicity, the greater the probability that an event with large multiplicity happens in the subsequent minutes.

Figure 4: Top left panel: Probability that a cojump with multiplicity larger than or equal to 10 occurs in a $\tau = 5$ minute interval following a cojump at time $t$ with multiplicity $M_t \ge M$. Plots are obtained from historical and simulated data. The error bars represent standard errors. Top right and bottom left panels: Threshold 10 replaced by 30 and 60, respectively. Bottom right panel: Expected amplitude of the cojumps in a $\tau = 5$ minute interval following a cojump with multiplicity $M_t \ge M$.

6 Discussion 

By investigating a portfolio of highly liquid stocks, our research highlights a remarkable fact: since 2001 the total number of extreme events has markedly diminished, but the number of occurrences where a sizable fraction of assets jump together has increased. This trend becomes more pronounced as we consider events of higher multiplicity. This evidence is a clear sign that markets are nowadays more interconnected and that a strong synchronization between jumps of different assets is present.

What are the factors responsible for the appearance of extreme movements? The cause can be either exogenous or endogenous. The former case is linked to the release of macroeconomic news impacting the price dynamics, while the latter may result from unstable market conditions, such as a temporary lack of liquidity. Quite unexpectedly, only a minor fraction (up to 40%) of the cojumps involving a large number of assets can be attributed to exogenous news. The remaining 60% suggests that a more intriguing endogenous mechanism is taking place. Why has the synchronization among different assets increased in recent years? We hypothesize that a major role is played by the dramatic increase in algorithmic trading. Thanks to technological innovation, faster information processing is responsible for the more rapid propagation of large price movements through different assets. We also provide evidence that highly systemic instabilities have the double effect of (i) increasing the probability that another systemic event takes place in the near future and (ii) increasing the degree of systemicity of short-term instabilities.

The short timescale of the memory of the exciting effects and the strong persistence of the cross-excitation among different multiplicities support the idea that, to achieve an accurate description of high frequency price dynamics, we should abandon conventional modeling assumptions. Accordingly, we propose an innovative approach to the collective behavior of asset prices based on the Hawkes description of the multiplicity process. Our model describes well the short term dynamics of systemic instabilities while preserving a remarkable parsimony in the number of parameters. Thus, it provides a realistic description of the market behavior, which is of prime importance from several perspectives, from trading to risk control and market design.

A Support Information: Data 

A.1 Market data

Data are provided by Kibot, www.kibot.com. We consider the thirteen years from 2001 to 
2013 and for each year we select 140 highly liquid stocks in the Russell 3000 index. We exclude 
American Depositary Receipts, which are negotiable instruments representing ownership in 
non-US companies, since their dynamics is heavily influenced by their primary market and 
thus shows a peculiar intraday pattern. We use 1-minute closing price data during the regular 
US trading session, i.e. from 9:30 a.m. to 4:00 p.m. We discard early-closing days (typically, 
the eves of Independence Day, Thanksgiving and Christmas). Data are adjusted for splits and 
dividends. 

Intraday returns are first filtered for the average intraday pattern, since price fluctuations are known to exhibit significant differences in absolute size depending on the time of the day, showing a typical U shape with larger movements at the beginning and at the end of the trading day. We perform this filtering in a standard way by dividing price returns by the intraday pattern, which is calculated as the average, over all days, of absolute returns rescaled by the daily volatility. Such normalised returns no longer possess any daily regularities and can thus be considered a unique time series with no periodic structure. For more details please refer to [9].
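The filtering step can be sketched as follows. The sketch assumes that returns are arranged in a days-by-minutes matrix with no missing values and uses the standard deviation of each day's returns as the daily volatility; the paper does not state which daily volatility estimator is used here, so that choice, like the function name, is an assumption.

```python
import numpy as np

def remove_intraday_pattern(returns):
    """Divide returns by the average intraday pattern.

    returns : array of shape (days, minutes_per_day).
    Returns (filtered_returns, pattern), where pattern has one value per
    minute of the trading day.
    """
    r = np.asarray(returns, dtype=float)
    # Daily volatility proxy: one value per day (assumed estimator)
    daily_vol = r.std(axis=1, keepdims=True)
    # Average over days of absolute returns rescaled by daily volatility
    pattern = np.mean(np.abs(r) / daily_vol, axis=0)
    return r / pattern, pattern
```

Days with zero volatility would produce a division by zero and should be excluded before calling the function.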
Figure 5: Yearly time evolution of the fraction of cojumps with multiplicity $M \ge 30$ (left) or $M \ge 60$ (right) over the total number of cojumps ($M > 1$) for $\theta = 4$ and different time horizons, namely 1, ..., 5 minutes.


A.2 News data 

The macronews dataset is provided by Econoday, Inc., www.econoday.com. Table 1 shows the 
number of news announcements, organized by year and news category. 


B Support Information: Dependence of systemic cojumps 
on time scale detection 

The paper mostly considers one-minute (co)jumps. However, one minute in 2013 is not equivalent to one minute in 2001 in terms of market activity. Hence it is important to test whether the increase in the number of high multiplicity cojumps is due to the fact that in earlier years synchronization occurred on a time scale longer than one minute. To test this possibility we have repeated the analysis varying the time scale for jump detection from one to five minutes. Analyses of the dynamics of cross-correlation between stock data suggest that the time scale over which stocks become correlated has decreased by a factor approximately equal to five from 2001 to 2013.

Fig. 5 shows the yearly time evolution of the fraction of cojumps with multiplicity $M \ge 30$ (left) or $M \ge 60$ (right) over the total number of cojumps ($M > 1$) for $\theta = 4$ and different time scales, namely 1, ..., 5 minutes. Except for the first two years, no clear sorting of this fraction with the time scale is detectable, while the global secular trend has a much larger variability. This is particularly evident for the $M \ge 60$ case. Hence the number of high multiplicity one-minute cojumps in 2013 is much higher than the number of high multiplicity five-minute cojumps in 2001, indicating that the increased speed of market activity is a minor cause of the increase of high multiplicity systemic cojumps in recent years.

C Support Information: Model 

In the paper we model the point process describing the cojumps of $k$ stocks (independently of their identity) as the $k$-th component of a multivariate Hawkes process. These processes were introduced in the early Seventies [21], and have been widely employed to model earthquake data [34, 35, 36]. For a complete overview of the properties of Hawkes processes please refer to [37, 38], while for a review of their recent applications in a financial context see [30]. Here we detail how we build and estimate the model.

C.1 Multivariate Hawkes point processes

An $N$-dimensional Hawkes process is a point process characterized by the vector of intensities 
$\lambda_t := (\lambda_t^1, \ldots, \lambda_t^N)'$, where the $i$-type intensity satisfies the relation 

$$\lambda_t^i = \mu^i(t) + \sum_{j=1}^{N} \sum_{t_k^j < t} \nu_{ij}(t - t_k^j), \qquad (1)$$

where $\mu^i$ and $\nu_{ij}$ are positive deterministic functions for all $i, j = 1, \ldots, N$. The set $\{t_k^j\}$ 
corresponds to the random sequence of increasing events associated with the $j$-component of 
the $N$-dimensional point process. If $\mu^i(t) = \mu^i$ is a constant and the kernel functions $\nu_{ij}$ reduce 
identically to zero, then the Hawkes point process describing the $i$-component reduces to a 
Poisson process with constant intensity $\mu^i$. On the contrary, if the kernel is positive, each time 
an event occurs for any component of the multidimensional process, the intensity $\lambda_t^i$ increases 
by a positive amount. 
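
As an illustration of the definition above, the following minimal Python sketch evaluates the intensity of one component from its baseline, a kernel, and the past events of every component. The function name `intensity`, the argument layout, and the kernel used in the usage lines are assumptions made for this example only; they are not part of the paper.

```python
import math

def intensity(i, t, mu, nu, events):
    """Intensity of component i at time t.

    mu     : list of callables, mu[i](t) is the baseline of component i
    nu     : callable nu(i, j, s), the kernel evaluated at lag s > 0
    events : list of lists, events[j] holds the event times of component j
    """
    total = mu[i](t)
    for j, times in enumerate(events):
        # Only events strictly before t contribute to the intensity at t.
        total += sum(nu(i, j, t - tk) for tk in times if tk < t)
    return total

# Usage with two components and an illustrative (assumed) kernel.
mu = [lambda t: 0.1, lambda t: 0.2]
nu = lambda i, j, s: 0.5 * math.exp(-s)
events = [[1.0, 3.0], [2.5]]
lam0 = intensity(0, 2.0, mu, nu, events)   # baseline plus one past event at t = 1.0
```

With a kernel that is identically zero the function returns the baseline alone, which is the Poisson case described in the text.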

C.2 Choice of the parametrization 

As in most high-dimensional problems, the estimation of multivariate Hawkes processes is 
problematic because of the large number of parameters. In order to overcome the curse of 
dimensionality problem, in this paper we choose a quite rigid parametrization of the kernel 
matrix, reducing significantly the number of free parameters. We also propose a method to 
estimate the model on data. 

First of all, we assume that the vector $\mu := (\mu^1, \ldots, \mu^N)'$ does not depend on time. Second, 
we consider the most common parametrization of the kernel in terms of exponential functions 

$$\nu_{ij}(t - t_k^j) := \alpha_{ij}\, e^{-\beta_{ij}(t - t_k^j)},$$

with $\alpha_{ij} > 0$ and $\beta_{ij} > 0$ for all $i, j$. The parameter $\alpha_{ij}$ fixes the scale of the intensity process $\lambda_t^i$ 
and provides the deterministic amount by which the $j$-type event at $t_k^j$ shocks the intensity of 
the $i$-type process. The parameter $\beta_{ij}$ describes the inverse of the time needed by the process 
$i$ to lose memory of a count of process $j$. 
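
The roles of the two kernel parameters (jump size and decay rate) can be made concrete with a small simulation. The sketch below uses a standard thinning scheme in continuous time for exponential kernels with a constant baseline; it is an illustrative assumption and not necessarily the simulation scheme used for the results of this paper. The function name `simulate_hawkes` and its arguments are hypothetical.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on (0, horizon] by thinning.

    mu    : (N,) constant baseline intensities
    alpha : (N, N) jump sizes; alpha[i, j] is added to intensity i at a j-type event
    beta  : (N, N) decay rates of the exponential kernels
    Returns a list of (time, component) pairs in increasing time order.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # excite[i, j]: current contribution of past j-type events to intensity i
    excite = np.zeros_like(alpha)
    t, events = 0.0, []
    while True:
        # Between events every intensity decays, so the current total is an upper bound.
        lam_bar = mu.sum() + excite.sum()
        wait = rng.exponential(1.0 / lam_bar)
        t += wait
        if t > horizon:
            return events
        excite *= np.exp(-beta * wait)            # each kernel term decays at its own rate
        lam = mu + excite.sum(axis=1)             # intensities at the candidate time
        if rng.uniform() * lam_bar <= lam.sum():  # accept the candidate point
            j = int(rng.choice(mu.size, p=lam / lam.sum()))
            events.append((t, j))
            excite[:, j] += alpha[:, j]           # the j-type event shocks every component i
```

Because the kernels are exponential, the state of the process is fully captured by the matrix `excite`, so no event history has to be stored to update the intensities.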

The process is stationary if the spectral radius (i.e. the largest absolute value of the 
eigenvalues) of the matrix $\Gamma$ of elements 

$$\Gamma_{ij} = \frac{\alpha_{ij}}{\beta_{ij}}$$

is strictly smaller than one. In this case the unconditional expected intensities of the process 
read 

$$\mathbb{E}[\lambda_t] = (I_N - \Gamma)^{-1} \mu, \qquad (2)$$

where $I_N$ is the $N$-dimensional identity matrix. 
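
The stationarity condition and Eq. (2) translate directly into a few lines of linear algebra. The following is a minimal sketch with NumPy; the function name `expected_intensity` is an assumption made for this example.

```python
import numpy as np

def expected_intensity(mu, alpha, beta):
    """Return the unconditional expected intensities of Eq. (2).

    The ratio alpha / beta is taken element-wise. A ValueError is raised when
    the spectral radius of the resulting matrix is not below one, i.e. when
    the stationarity condition fails.
    """
    mu = np.asarray(mu, dtype=float)
    gamma = np.asarray(alpha, dtype=float) / np.asarray(beta, dtype=float)
    spectral_radius = np.max(np.abs(np.linalg.eigvals(gamma)))
    if spectral_radius >= 1.0:
        raise ValueError(f"non-stationary: spectral radius = {spectral_radius:.3f}")
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(mu.size) - gamma, mu)
```

Solving the linear system rather than inverting the matrix is a numerical design choice; both give the same result for a well-conditioned problem.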

We make the following further assumptions: 

• We assume that all the $\beta_{ij}$ are equal to a constant value $\beta > 0$. This means that there is 
only one time scale characterizing the decay of the kernels. 

• We impose the condition that $\mu^i = \eta\, \mathbb{E}[\lambda_t^i]$, with $0 < \eta < 1$. This means that the 
distribution of multiplicity in the observed process is the same as the distribution of the 
multiplicity in the baseline (or ancestor) process. In other words, the cross-excitation 
between the different components of the Hawkes process does not change the unconditional 
law of multiplicity. Notice that this assumption implies that 

$$\Gamma\, \mathbb{E}[\lambda_t] = (1 - \eta)\, \mathbb{E}[\lambda_t],$$

i.e. $\mathbb{E}[\lambda_t]$ (or $\mu$) is the eigenvector of $\Gamma$ with eigenvalue $1 - \eta$. 

• The generic matrix element $\Gamma_{ij}$ describing the intensity of the excitation of variable $j$ on 
variable $i$ is the product of a term $D_{ii}$ which depends on the excited variable and a term 
$\sigma(|i - j|)$ which depends on the absolute difference of the two multiplicities. Therefore 
we can rewrite $\Gamma = D\Sigma$, where $D$ is a diagonal matrix of elements 

$$D_{ii} = \frac{(1 - \eta)\, \mathbb{E}[\lambda_t^i]}{\sum_{j=1}^{N} \sigma(|i - j|)\, \mathbb{E}[\lambda_t^j]}$$

and $\Sigma_{ij} = \sigma(|i - j|)$. 

• Finally, we parametrize the matrix $\Sigma$ as 

$$\Sigma_{ij} = \sigma(|i - j|) = (|i - j| + 1)^{-\gamma}.$$

This hyperbolic decay is chosen to model with only one parameter $\gamma$ the strong 
cross-excitation between two very different multiplicities. 

The model is therefore parametrized by the vector $\mu$ and the three parameters $\eta$, $\gamma$, and $\beta$. 

Before presenting the estimation procedure, we discuss some properties of the model. As 
all the entries of $\Gamma$ are strictly positive, the Perron-Frobenius Theorem applies. Then, there 
exists only one eigenvector with all strictly positive components, and the associated eigenvalue 
is the spectral radius. Since $\mathbb{E}[\lambda_t^i] > 0$ for all $i = 1, \ldots, N$, we conclude that the spectral radius 
is $1 - \eta$. Incidentally, we notice that all the eigenvalues of $\Gamma$ are real. This property readily 
follows from observing that $\Gamma$ is the product of two symmetric matrices, and $D$ is diagonal and 
positive definite. Indeed, denoting with $\sqrt{D}$ the square root of the matrix $D$, $\Gamma$ is similar to 
$\sqrt{D}^{-1} D \Sigma \sqrt{D}$, which is by construction symmetric. Moreover, if $\Gamma$ is diagonal dominant, i.e. if 
$|\Gamma_{ii}| > \sum_{j \neq i} |\Gamma_{ij}|$ for $i = 1, \ldots, N$, the eigenvalues are also strictly positive. 
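
The construction of the kernel matrix and the properties just discussed can be checked numerically. The sketch below builds the matrix from a vector of expected intensities, the stationarity parameter, and the hyperbolic exponent, fixing the diagonal factor by the eigenvector condition (expected intensities are an eigenvector with eigenvalue one minus the stationarity parameter). Function and argument names are assumptions made for this example.

```python
import numpy as np

def build_gamma(expected_lambda, eta, gamma_exp):
    """Kernel matrix D @ Sigma with Sigma_ij = (|i - j| + 1) ** (-gamma_exp).

    The diagonal of D is chosen so that the matrix maps expected_lambda
    to (1 - eta) * expected_lambda.
    """
    lam = np.asarray(expected_lambda, dtype=float)
    idx = np.arange(lam.size)
    sigma = (np.abs(idx[:, None] - idx[None, :]) + 1.0) ** (-gamma_exp)
    d = (1.0 - eta) * lam / (sigma @ lam)
    return d[:, None] * sigma        # row i of Sigma scaled by D_ii

def check_properties(gamma_matrix, expected_lambda, eta):
    """Numerical checks of the eigenvector, spectrum, and diagonal dominance."""
    lam = np.asarray(expected_lambda, dtype=float)
    eig = np.linalg.eigvals(gamma_matrix)
    diag = np.abs(np.diag(gamma_matrix))
    off_diag = np.abs(gamma_matrix).sum(axis=1) - diag
    return {
        "eigenvector_ok": bool(np.allclose(gamma_matrix @ lam, (1 - eta) * lam)),
        "spectral_radius": float(np.max(np.abs(eig))),
        "eigenvalues_real": bool(np.allclose(eig.imag, 0.0, atol=1e-8)),
        "diagonal_dominant": bool(np.all(diag > off_diag)),
    }
```

Diagonal dominance is reported rather than assumed, since it depends on the chosen parameters and on the vector of expected intensities.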


C.3 Estimation of the model parameters 

A rigorous estimation of our model’s parameters through likelihood maximization poses several 
computational problems. We instead propose a heuristic and robust calibration procedure 
based on moments. In particular we consider the following two conditional expectations, whose 
values on real and simulated data are graphed in Fig. 4 of the main article: 


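$$f_\tau^{(1)}(M; J) := \mathbb{P}\left[\exists\, t' \in (t, t+\tau] \text{ s.t. } M_{t'} \geq J \,\middle|\, M_t \geq M\right], \qquad (3)$$

$$f_\tau^{(2)}(M) := \mathbb{E}\left[M_{t'} \,\middle|\, M_t \geq M,\ \exists\, t' \in (t, t+\tau] \text{ s.t. } M_{t'} > 0\right]. \qquad (4)$$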


The first quantity, $f_\tau^{(1)}(M; J)$, is the probability of observing a systemic event with multiplicity 
at least $J$ inside a time interval of length $\tau$ after a cojump of multiplicity $M_t$ larger than or 
equal to $M$. It therefore measures the probability that a cojump of multiplicity at least $M$ 
triggers a systemic cojump ($J$ fixes the threshold for a systemic cojump). The second quantity, 
$f_\tau^{(2)}(M)$, is the average multiplicity of the cojumps inside a time interval of length $\tau$ after a 
cojump of multiplicity $M_t$ larger than or equal to $M$. It therefore measures the typical cojump 
multiplicity triggered by a cojump of multiplicity at least $M$. 
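
The two conditional quantities can be estimated from a series of per-minute multiplicities along the following lines. This is a sketch under stated assumptions: the input `mult` is a hypothetical array with one entry per minute (zero when no cojump occurs), triggering cojumps without a complete following window are discarded, and the average multiplicity is pooled over all cojumps found in the windows. None of these conventions is specified in the text.

```python
import numpy as np

def conditional_moments(mult, M, J, tau):
    """Empirical estimates of the two conditional quantities.

    Returns (f1, f2): the fraction of triggering cojumps (multiplicity >= M)
    followed within tau steps by a cojump of multiplicity >= J, and the mean
    multiplicity of the cojumps observed in those windows.
    """
    mult = np.asarray(mult)
    # Keep only triggers whose window (t, t + tau] lies inside the sample.
    triggers = np.flatnonzero(mult[:-tau] >= M)
    if triggers.size == 0:
        return float("nan"), float("nan")
    hits, followers = 0, []
    for t in triggers:
        window = mult[t + 1 : t + tau + 1]
        hits += bool((window >= J).any())
        followers.extend(window[window > 0])
    f1 = hits / triggers.size
    f2 = float(np.mean(followers)) if followers else float("nan")
    return f1, f2
```

The same function can be applied to real and to simulated multiplicity series, which is what a moment-based comparison requires.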

We use $f_\tau^{(1)}(M; J)$ (for fixed $J$ and $\tau$) and $f_\tau^{(2)}(M)$ (for fixed $\tau$) to estimate via a weighted 
least squares approach the three model parameters $\eta$, $\gamma$, and $\beta$. Since we are not able to compute 
analytically the moments of $f_\tau^{(1)}(M; J)$ and $f_\tau^{(2)}(M)$ from the model, we perform Monte Carlo 
simulations with fixed parameters. Specifically, given a multiplicity $M$, let the data and the 
model conditional expectations of either of the quantities in Eq. [3] and [4] be represented by their 
average values $a_d(M)$, $a_m(M)$ and standard errors $\delta_d(M)$, $\delta_m(M)$. Then, for the expectation 
$f_\tau^{(i)}$ ($i = 1, 2$) we construct the loss function 


$$\chi^{(i)} = \sum_{M \in S} \frac{\left(a_d(M) - a_m(M)\right)^2}{\delta_d(M)^2 + \delta_m(M)^2}, \qquad (5)$$


where the sum is taken over a set of multiplicities $S$. We then construct the total loss function 
$\chi^{(1)} + 0.5\, \chi^{(2)}$. We search for the model parameters which minimize the loss function. Given 
the small number of parameters we explore a large region of the three-dimensional space of 
parameters on a 0.05-spaced grid. 
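
The calibration loop can be sketched as an exhaustive search over a parameter grid. In the code below the weighting of each squared difference by the sum of the squared standard errors of data and model is an assumption of this example, as are the function names and the callable `model_moments`, which stands in for a Monte Carlo evaluation of the two conditional expectations at given parameters.

```python
import itertools
import numpy as np

def chi(a_d, a_m, d_d, d_m):
    """Weighted squared distance between data and model averages over S."""
    a_d, a_m, d_d, d_m = (np.asarray(x, dtype=float) for x in (a_d, a_m, d_d, d_m))
    return float(np.sum((a_d - a_m) ** 2 / (d_d ** 2 + d_m ** 2)))

def grid_search(data, model_moments, grids, weight=0.5):
    """Minimise chi1 + weight * chi2 over the Cartesian product of `grids`.

    data          : ((a1, d1), (a2, d2)), averages and standard errors over S
                    for the two conditional expectations
    model_moments : hypothetical callable (eta, gamma, beta) -> same structure
    grids         : three sequences of candidate eta, gamma, beta values
    """
    (a1, d1), (a2, d2) = data
    best_params, best_loss = None, np.inf
    for params in itertools.product(*grids):
        (m1, e1), (m2, e2) = model_moments(*params)
        loss = chi(a1, m1, d1, e1) + weight * chi(a2, m2, d2, e2)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

An exhaustive search is affordable here only because the parameter space is three-dimensional; the cost grows with the product of the grid sizes times the cost of one Monte Carlo evaluation.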


C.4 Results for the investigated dataset 

As an example of the estimation procedure and to discuss the properties of the fitted model, 
we consider in detail the case of $N = 140$ highly liquid assets of the Russell 3000 Index in 
2013. The same set is used also in Fig. 4 of the main text. We fix $J = 10$ in Eq. [3], $\tau = 5$ in 
Eq. [3] and [4], $S = \{5, 10, 15, \ldots, 65, 70\}$ and look for the parameters that minimise the total 
loss function. Following this approach, we find a clear minimum corresponding to the values 
$\eta = 0.15$, $\beta = 0.6$, $\gamma = 2.65$. 

Figure 6: Left panel: Logarithmic entries ($\log_{10} \Gamma_{ij}$) of the matrix $\Gamma_{ij} := \alpha_{ij}/\beta_{ij}$ for $\beta_{ij} = \beta = 0.6$ for all 
$i, j = 1, \ldots, 140$, $\eta = 0.15$, and $\gamma = 2.65$. Right panel: Linear plot of the diagonal entries of $\Gamma$ 
as a function of the multiplicity $i$. 

The left panel of Fig. 6 reports the logarithmic value of the 140 x 140 entries of the $\Gamma$ matrix. 
Coherently with the definitions given above, $\Gamma_{ij}$, for fixed $i$, is the impact of past events with 
multiplicity $j$ on the multiplicity $i$. The largest value corresponds to the diagonal term $\Gamma_{ii} = D_{ii}$ 
and quantifies the shock of the intensity due to a self-exciting effect. Then, moving away from 
$\Gamma_{ii}$, the kernel matrix decreases symmetrically along the row according to a hyperbolic 
scaling with tail index $\gamma = 2.65$. The parameter $\eta$ rescales the level of the main diagonal of 
the matrix $\Gamma$, reported in the right panel of Fig. 6, and determines the degree of stationarity 
of the process. In Fig. 7 we plot the complete spectrum of the matrix $\Gamma$. As expected, the 
largest value corresponds to $1 - \eta = 0.85$, while the positivity of all the eigenvalues 
follows from the evidence, verified numerically, that the matrix is diagonal dominant. More 
specifically, for the chosen values of $\eta$, $\beta$, and $\gamma$ the matrix $\Gamma$ is determined uniquely through 
the specification of the vector of expected intensities, $\mathbb{E}[\lambda_t]$. In our numerical experiment we 
replace the vector of expected intensities multiplied by the length of the time series, i.e. 96,861, 
with the empirical frequencies observed for the 140 assets from the Russell 3000 Index in 2013. 
Fig. 8 conveys this information in terms of the Complementary of the Cumulative Distribution 
Function of the cojump multiplicities associated with the empirical data (bold line). We also 
report the same quantity measured from a synthetic time series corresponding to a Monte Carlo 
simulation of the 140-dimensional Hawkes process (dashed line). 

Figure 7: Eigenvalue spectrum of the matrix $\Gamma$. The spectral radius $\rho(\Gamma)$ corresponds to 
$1 - \eta$. Since $\eta = 0.15$, and more generally for $0 < \eta < 1$, the multidimensional Hawkes 
process describing the stochastic evolution of the multiplicity remains stationary. For the chosen 
parameter values, we verified numerically that $\Gamma$ satisfies the diagonal dominant condition and 
so all its eigenvalues are strictly positive. 

Figure 8: Log-log plot of the Complementary of the Cumulative Distribution Function of the 
cojump multiplicities. The bold line corresponds to the empirical distribution measured from 
the Russell 3000 data sample, 140 assets, during year 2013. The dashed line is the distribution 
obtained from a simulation of the multidimensional Hawkes process. The total number of 
minutes drawn from the simulation coincides with the length of the empirical time series and 
is equal to 96,861. 

Table 1: Number of news announcements, organized by year and news category. 
(a) Merit Extra Attention according to the classification provided by Econoday, Inc. 
(b) Market Moving Indicator according to the classification provided by Econoday, Inc. 










